On the Provision of a Comprehensive Computer Graphics Education in the Context of Computer Games
Position paper for the ACM SIGGRAPH/Eurographics Computer Graphics Education Workshop 200
Model-Based Reinforcement Learning with Continuous States and Actions
Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging. Approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models for the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model for the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy in the entire state space. We apply the resulting controller to the underpowered pendulum swing up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment
Approximate Dynamic Programming with Gaussian Processes
In general, it is difficult to determine an optimal closed-loop policy in nonlinear control problems with continuous-valued state and control domains. Hence, approximations are often inevitable. The standard method of discretizing states and controls suffers from the curse of dimensionality and strongly depends on the chosen temporal sampling rate. In this paper, we introduce Gaussian process dynamic programming (GPDP) and determine an approximate globally optimal closed-loop policy. In GPDP, value functions in the Bellman recursion of the dynamic programming algorithm are modeled using Gaussian processes. GPDP returns an optimal state feedback for a finite set of states. Based on these outcomes, we learn a possibly discontinuous closed-loop policy on the entire state space by switching between two independently trained Gaussian processes. A binary classifier selects one Gaussian process to predict the optimal control signal. We show that GPDP is able to yield an almost optimal solution to an LQ problem using few sample points. Moreover, we successfully apply GPDP to the underpowered pendulum swing up, a complex nonlinear control problem.
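As a rough illustration of modeling the value function in a Bellman recursion with a Gaussian process, the sketch below runs a few backups on a toy scalar problem. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the dynamics x' = x + u, the quadratic cost, the squared-exponential kernel with fixed hyperparameters, the discretized control set, and the helper names `se_kernel` and `gp_fit`. It uses only the GP posterior mean and leaves out the policy-learning step (two switching GPs and a classifier) that the abstract describes.

```python
import numpy as np


def se_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_fit(x_train, y_train, lengthscale=0.5, noise=1e-6):
    """Return a function that evaluates the GP posterior mean (zero prior mean)."""
    K = se_kernel(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return lambda x_query: se_kernel(x_query, x_train, lengthscale) @ alpha


# Hypothetical toy problem: x' = x + u with stage cost x^2 + u^2.
support = np.linspace(-2.0, 2.0, 15)    # finite set of support states
controls = np.linspace(-1.0, 1.0, 21)   # candidate controls (discretized here)

# Start the recursion from a GP model of the terminal cost x^2.
value = gp_fit(support, support ** 2)

for _ in range(3):  # three backward Bellman steps
    # Successor states, clipped so the GP is never queried outside the support range.
    nxt = np.clip(support[:, None] + controls[None, :], -2.0, 2.0)
    stage_cost = support[:, None] ** 2 + controls[None, :] ** 2
    q = stage_cost + value(nxt.ravel()).reshape(nxt.shape)
    best_controls = controls[q.argmin(axis=1)]  # greedy control at each support state
    value = gp_fit(support, q.min(axis=1))      # refit the GP to the backed-up values

v_origin = float(value(np.array([0.0]))[0])
v_edge = float(value(np.array([2.0]))[0])
```

Here `best_controls` holds one control per support state, which loosely corresponds to the state feedback on a finite set of states mentioned above; generalizing it to the whole state space is the part this sketch omits.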
An autoradiographic study of the projections from the lateral geniculate body of the rat.
The projections from the lateral geniculate body of the rat were followed using the technique of autoradiography after injections of [3H] proline into the dorsal and/or ventral nuclei of this diencephalic structure. Autoradiographs were prepared from either frozen or paraffin coronal sections through the rat brain. The dorsal nucleus of the lateral geniculate projected via the optic radiation to area 17 of the cerebral cortex. There was also a slight extension of label into the zones of transition between areas 17, 18 and 18a. The distribution of silver grains in the various layers of the cerebral cortex was analyzed quantitatively and showed a major peak of labeling in layer IV with minor peaks in outer layer I and the upper half and lowest part of layer VI. The significance of these peaks is discussed in respect to the distribution of geniculocortical terminals in other mammalian species. The ventral nucleus of the lateral geniculate body had 5 major projections to brain stem structures both ipsilateral and contralateral to the injected nucleus. There were two dorsomedial projections: (1) a projection to the superior colliculus which terminated mainly in the medial third of the stratum opticum, and (2) a large projection via the superior thalamic radiation which terminated in the ipsilateral pretectal area; a continuation of this projection passed through the posterior commissure to attain the contralateral pretectal area. 
The three ventromedial projections involved: (1) a geniculopontine tract which coursed through the basis pedunculi and the lateral lemniscus to terminate in the dorsomedial and dorsolateral parts of the pons after giving terminals to the lateral terminal nucleus of the accessory optic tract, (2) a projection via Meynert's commissure to the suprachiasmatic nuclei of both sides of the brain stem as well as to the contralateral ventral lateral geniculate nucleus and lateral terminal nucleus of the accessory optic tract, and (3) a medial projection to the ipsilateral zona incerta. The results obtained in these experiments are contrasted with other data on the rat's central visual connections to illustrate the importance of these connections in many subcortical visual functions
PIPPS: Flexible model-based policy search robust to the curse of chaos
Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning imply that the problem is not just a numerical issue, but it may be caused by a fundamental chaos-like nature of long chains of nonlinear computations. Not only do the magnitudes of the gradients become large, the direction of the gradients becomes essentially random. We show that reparameterization gradients suffer from the problem, while likelihood ratio gradients are robust. Using our insights, we develop a model-based policy search framework, Probabilistic Inference for Particle-Based Policy Search (PIPPS), which is easily extensible, and allows for almost arbitrary models and policies, while simultaneously matching the performance of previous data-efficient learning algorithms. Finally, we invent the total propagation algorithm, which efficiently computes a union over all pathwise derivative depths during a single backwards pass, automatically giving greater weight to estimators with lower variance, sometimes improving over reparameterization gradients by 10^6 times.
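The two gradient estimators named in this abstract can be compared on a one-dimensional toy problem. The sketch below is not the PIPPS or total propagation algorithm; it only estimates the gradient of E[f(x)] with respect to the mean of a Gaussian using a reparameterization (pathwise) estimator and a likelihood ratio (score function) estimator, then combines them with inverse-variance weights. The objective f(x) = x^2, the parameter values, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 1.5, 1.0, 200_000  # hypothetical toy settings


def f(x):
    return x ** 2


def df(x):
    return 2.0 * x


# Samples x = mu + sigma * eps with eps ~ N(0, 1).
eps = rng.standard_normal(n)
x = mu + sigma * eps

# Reparameterization gradient: differentiate through the sample path (dx/dmu = 1).
rp_samples = df(x)
# Likelihood ratio gradient: f(x) times the score d/dmu log N(x | mu, sigma^2).
lr_samples = f(x) * (x - mu) / sigma ** 2

rp_grad, lr_grad = rp_samples.mean(), lr_samples.mean()

# Estimated variance of each sample-mean estimator.
rp_var = rp_samples.var(ddof=1) / n
lr_var = lr_samples.var(ddof=1) / n

# Inverse-variance weighting gives more weight to the lower-variance estimator.
w_rp = (1.0 / rp_var) / (1.0 / rp_var + 1.0 / lr_var)
combined = w_rp * rp_grad + (1.0 - w_rp) * lr_grad

true_grad = 2.0 * mu  # since E[x^2] = mu^2 + sigma^2
```

In this smooth toy case the reparameterization estimator has the lower variance, so it receives most of the weight. The abstract's point is that the ranking can reverse over long chains of nonlinear computations, which is why weighting by variance rather than fixing one estimator is attractive.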
On the Validity of Isotropic Complex α-Stable Interference Models for Interference in the IoT
International audienc
Manifold Gaussian Processes for regression
Off-the-shelf Gaussian Process (GP) covariance functions encode smoothness assumptions on the structure of the function to be modeled. To model complex and nondifferentiable functions, these smoothness assumptions are often too restrictive. One way to alleviate this limitation is to find a different representation of the data by introducing a feature space. This feature space is often learned in an unsupervised way, which might lead to data representations that are not useful for the overall regression task. In this paper, we propose Manifold Gaussian Processes, a novel supervised method that jointly learns a transformation of the data into a feature space and a GP regression from the feature space to observed space. The Manifold GP is a full GP and allows learning data representations that are useful for the overall regression task. As a proof-of-concept, we evaluate our approach on complex non-smooth functions where standard GPs perform poorly, such as step functions and robotics tasks with contacts. The research leading to these results has received funding from the European Council under grant agreement #600716 (CoDyCo - FP7/2007–2013). M. P. Deisenroth was supported by a Google Faculty Research Award. This is the accepted manuscript. It is currently embargoed pending publication.
Initial fixation placement in face images is driven by top-down guidance
The eyes are often inspected first and for a longer period during face exploration. To examine whether this saliency of the eye region at the early stage of face inspection is attributable to its local structural properties or to the knowledge of its essence in facial communication, in this study we investigated the pattern of eye movements produced by rhesus monkeys (Macaca mulatta) as they freely viewed images of monkey faces. Eye positions were recorded accurately using implanted eye coils, while images of original faces, faces with scrambled eyes, and scrambled faces except for the eyes were presented on a computer screen. The eye region in the scrambled faces attracted the same proportion of viewing time and fixations as it did in the original faces, and even the scrambled eyes attracted a substantial proportion of viewing time and fixations. Furthermore, the monkeys often made the first saccade towards the location of the eyes regardless of image content. Our results suggest that the initial fixation placement in faces is driven predominantly by ‘top-down’ or internal factors, such as the prior knowledge of the location of “eyes” within the context of a face.
Impulsive Multivariate Interference Models for IoT Networks
Device density in wireless internet of things (IoT) networks is now rapidly increasing and is expected to continue in the coming years. As a consequence, interference is a crucial limiting factor on network performance. This is true for all protocols operating on ISM bands (such as SigFox and LoRa) and licensed bands (such as NB-IoT). In this paper, with the aim of improving system design, we study the statistics of the interference due to devices in IoT networks, particularly those exploiting NB-IoT. Existing theoretical and experimental works have suggested that interference on each subband is well-modeled by impulsive noise, such as α-stable noise. If these devices operate on multiple partially overlapping resource blocks, which is an option standardized in NB-IoT, complex statistical dependence between interference on each subband is introduced. To characterize the multivariate statistics of interference on multiple subbands, we develop a new model based on copula theory and demonstrate that it effectively captures both the marginal α-stable model and the dependence structure induced by overlapping resource blocks. We also develop a low complexity estimation procedure tailored to our interference model, which means that the copula model can often be expressed in terms of standard network parameters without significant delays for calibration. We then apply our interference model in order to optimize receiver design, which provides a tractable means of outperforming existing methods for a wide range of network parameters.
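To make the impulsive character of α-stable noise concrete, the sketch below draws symmetric α-stable samples with the Chambers-Mallows-Stuck method and compares tail mass across values of α. It covers only the marginal model on a single subband; the copula-based dependence between subbands and the estimation procedure described in the abstract are not implemented. The function name `sample_symmetric_alpha_stable`, the unit scale, and the threshold of 10 are assumptions chosen for illustration.

```python
import numpy as np


def sample_symmetric_alpha_stable(alpha, size, rng):
    """Draw unit-scale symmetric alpha-stable samples (Chambers-Mallows-Stuck)."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-np.pi / 2, np.pi / 2, size)  # uniform phase
    w = rng.exponential(1.0, size)                # unit-mean exponential
    return (
        np.sin(alpha * u) / np.cos(u) ** (1.0 / alpha)
        * (np.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha)
    )


rng = np.random.default_rng(0)
n = 200_000

gauss_like = sample_symmetric_alpha_stable(2.0, n, rng)  # alpha = 2: Gaussian, variance 2
cauchy = sample_symmetric_alpha_stable(1.0, n, rng)      # alpha = 1: standard Cauchy
impulsive = sample_symmetric_alpha_stable(1.5, n, rng)   # heavy-tailed intermediate case

# Fraction of samples with large magnitude: a crude indicator of impulsiveness.
tail_gauss = np.mean(np.abs(gauss_like) > 10)
tail_impulsive = np.mean(np.abs(impulsive) > 10)
```

For α = 2 the large-magnitude fraction is essentially zero, while for α = 1.5 a visible share of samples exceeds the threshold. Those rare large values are the behavior that a Gaussian interference model cannot reproduce.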